{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "75114978-3c80-4ac9-99db-df3c3fd74c25",
   "metadata": {},
   "source": [
    "# Molecule Replacement Using QMzyme\n",
    "## Objective\n",
    "\n",
    "The objective of this tutorial is to replace a currently present ligand/residue in a protein system with a substrate/mutation of choice.\n",
    "\n",
    "This workflow allows you to:\n",
    "\n",
    "- Align the heavy atoms of the desired molecule with the heavy atoms of the reference molecule\n",
    "- Replace the current molecule/residue with the desired molecule/residue\n",
    "\n",
    "It is recommended to first visit the QM-only calculation workflow due to the similarity in the workflow. You can find it [here](https://qmzyme.readthedocs.io/en/latest/Examples/QM-only%20Calculation.html).\n",
    "\n",
    "In this specific example, we are using ketosteroid isomerase (KSI) as the model system. The structure for KSI is obtained from the PDB [1OH0](https://doi.org/10.2210/pdb1OH0/pdb) and MM-minimized prior to this tutorial. In this PDB, KSI is bound to equilenin (EQU), which is an analogue of the dienolate reaction intermediate. Quite often, it is seen that resolved crystal structures contain non-reactive analogues; however, most QM studies aim to study the reaction path. Hence, it is necessary to replace the non-reactive analogue with the reactive substrate. In the case of KSI, we'll be replacing it with product analogue, 19NT. Since EQU is a reaction analogue, it can be assumed that these molecules will have similar binding poses. Hence, rather than going through a docking procedure, we only need a substrate molecule (which we made by modifying EQU), and then use QMzyme to align it to and replace EQU, thus providing us with the starting structure for the reactive species.\n",
    "\n",
    "## Classes used in this example\n",
    "\n",
    "- [Generate Model](https://qmzyme.readthedocs.io/en/latest/API/QMzyme.GenerateModel.html)\n",
    "- [QM_method](https://qmzyme.readthedocs.io/en/latest/API/QMzyme.CalculateModel.html#qm-treatment)\n",
    "- [SelectionSchemes](https://qmzyme.readthedocs.io/en/latest/API/QMzyme.SelectionSchemes.html#)\n",
    "- [DistanceCutoff SelectionSchemes](https://qmzyme.readthedocs.io/en/latest/API/QMzyme.SelectionSchemes.html#QMzyme.SelectionSchemes.DistanceCutoff) (optional)\n",
    "- [QMzymeRegion](https://qmzyme.readthedocs.io/en/latest/API/QMzyme.QMzymeRegion.html)\n",
    "    - [align_to method](https://qmzyme.readthedocs.io/en/latest/API/QMzyme.QMzymeRegion.html#QMzyme.QMzymeRegion.QMzymeRegion.align_to) \n",
    "    - [subtract method](https://qmzyme.readthedocs.io/en/latest/API/QMzyme.QMzymeRegion.html#QMzyme.QMzymeRegion.QMzymeRegion.subtract)\n",
    "    - [combine method](https://qmzyme.readthedocs.io/en/latest/API/QMzyme.QMzymeRegion.html#QMzyme.QMzymeRegion.QMzymeRegion.combine)\n",
    "- [CA_terminal TruncationSheme](https://qmzyme.readthedocs.io/en/latest/API/QMzyme.TruncationSchemes.html#QMzyme.TruncationSchemes.CA_terminal)\n",
    "\n",
    "## Required Files\n",
    "\n",
    "To start, you will need:\n",
    "\n",
    "- A fully prepped and protonated PDB of the reference protein file with the ligand bound (if applicable).\n",
    "- A structure of the desired substrate (preferably generated from the currently bound ligand)/mutation (preferably in .PDB or .mol2 format).\n",
    "\n",
    "---"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "e945fb89-c454-42eb-a813-3d6d4e230c5e",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Here are the necesary imports for this tutorial!\n",
    "\n",
    "import QMzyme\n",
    "from QMzyme import GenerateModel\n",
    "from QMzyme.SelectionSchemes import DistanceCutoff\n",
    "from QMzyme.data import PDB, LIG\n",
    "from QMzyme.RegionBuilder import RegionBuilder\n",
    "import MDAnalysis"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4fd1886c-6028-4ff5-865d-1c12c9e6ac01",
   "metadata": {},
   "source": [
    "## Initialize Model and Define Regions\n",
    "\n",
    "First, we need to generate a model for the full protein–ligand complex using `GenerateModel()`. We update the charge of EQU with `residue_charge.update()`, where we can manually update the charge of an unknown residue. Afterwards, `set_catalytic_center()` is used to designate a catalytic center in QMzymeModel. At last, we will utilize `DistanceCutoff()` class to get a region containing ligand that needs to be replaced."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "cd093e91-e48f-4288-849c-917f1767873d",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Charge information not present. QMzyme will try to guess region charges based on residue names consistent with AMBER naming conventions (i.e., aspartate: ASP --> Charge: -1, aspartic acid: ASH --> Charge: 0.). See QMzyme.data.residue_charges for the full set.\n"
     ]
    }
   ],
   "source": [
    "# We first initilize QMzymeModel by loading structure file.\n",
    "# The initial pdb file should be prepared prior (hydrogens must be present).\n",
    "model = QMzyme.GenerateModel(PDB)\n",
    "\n",
    "# This adds unknown residue charge.\n",
    "QMzyme.data.residue_charges.update({'EQU': -1}) # EQU ligand\n",
    "\n",
    "# We set our catalytic center and our region of interest using DistanceCutoff SelectionScheme.\n",
    "model.set_catalytic_center(selection='resname EQU and segid A')\n",
    "model.set_region(selection='resname EQU and segid A', name='old_ligand')\n",
    "model.set_region(selection=DistanceCutoff, cutoff=5)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7cf1a185-2cc6-459b-a7e5-2dc1287ba9dc",
   "metadata": {},
   "source": [
    "## Acquire Replacement Region\n",
    "\n",
    "Here, we initialize the substrate model and create the region that corresponds to the atoms you want to include in the final model. This is done first by loading the replacement structure file as MDAnalysis universe and saving it as QMzymeRegion, which can be used in next step for alignment and replacement.\n",
    "\n",
    "*Note:* In this case, we are using the whole substrate PDB to create the region containing all the atoms that are to be substituted. However, in case there is a full protein structure, the region can be made by selecting the corresponding atoms, and the protocol remains the same."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "4b7b54ce-c88d-4550-85b4-85a38d5af019",
   "metadata": {},
   "outputs": [],
   "source": [
    "# We import the ligand region and update charge.\n",
    "region_19nt = model.import_region(LIG, name='19nt')\n",
    "QMzyme.data.residue_charges.update({'6VW': 0})\n",
    "\n",
    "# Update the resid of the replacement region to the removed region.\n",
    "original_resid = model.old_ligand.atoms[0].resid\n",
    "region_19nt.atom_group.residues.resids = original_resid\n",
    "for atom in region_19nt.atoms:\n",
    "    atom.resid = original_resid"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "11fa4d05-e968-4a47-acd9-52af7bc75bb5",
   "metadata": {},
   "source": [
    "## Aligning the Substrate to the Bound Ligand\n",
    "\n",
    "The `align_to()` method is present in the QMzymeRegion class. It can be used to align specific atoms and then update the aligned atoms to the region.The alignment procedure also reports RMSD values before and after alignment.\n",
    "\n",
    "⚠ **Important considerations:**\n",
    "\n",
    "- Align the **substrate (self)** to the **full protein–ligand complex (other)**.\n",
    "- Do **not** align the other way around.\n",
    "- This can be ensured by specifying the substrate model before the \"align_to\" function. This will assign the substrate model to \"self\".\n",
    "  \n",
    "This ensures:\n",
    "- The substrate region is updated to match the aligned coordinates.\n",
    "- The substrate moves into the position of the bound ligand.\n",
    "- The ligand does *not* move to the substrate coordinates.\n",
    "\n",
    "Two critical arguments `self_selection=` and `other_selection=` specify the atoms used for alignment. Make sure that the number of atoms in the reference (self) and in the target (other) are the same. If the number of atoms in the selections differs, then the alignment will fail.\n",
    "\n",
    "If your substrate PDB was generated independently (not modified directly from the bound ligand), ensure that corresponding heavy atoms are properly labeled. This will ensure that the correct atoms are getting aligned. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "287252fa-3dae-4a6b-b150-6de5a8efc2a4",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "RMSD before alignment: 106.6041030883789 Å\n",
      "RMSD after alignment: 0.26720352674071335 Å\n"
     ]
    }
   ],
   "source": [
    "# Align the new ligand to our old ligand.\n",
    "region_19nt.align_to(other=model.old_ligand,\n",
    "    self_selection=\"element C\", other_selection=\"element C\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "25773e40-c30b-40d6-82ee-e806f049c5d9",
   "metadata": {},
   "source": [
    "## Replace the Currently Bound Ligand with the Aligned Substrate\n",
    "\n",
    "To replace the current ligand with our desired ligand, we have to create a region containing both our original DistanceCutoff region containing current ligand and our replacement region with `combine()`. Afterwards, we can remove our current ligand using `subtract()`, a method that removes the selected region from the combined region. Afterards, we will save the new region in our `model_ksi` using `set_region()` to include it to QMzymeModel for input file generation."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "ce8fe37b-8137-4aeb-9485-01a22fa2fd6c",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Combine new ligand and protein in new QMzymeRegion.\n",
    "new_region = model.get_region(\"cutoff_5\")\n",
    "\n",
    "# Remove old ligand.\n",
    "new_region = new_region.subtract(model.old_ligand)\n",
    "\n",
    "# Add new ligand.\n",
    "new_region = new_region.combine(region_19nt)\n",
    "\n",
    "# Add new region to ksi_model.\n",
    "model.set_region(selection=new_region, name=\"ksi_19nt_cutoff5\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b0287652-208a-4559-9a27-907651609208",
   "metadata": {},
   "source": [
    "<img src=\"../Images/before.png\" width=\"600\">\n",
    "\n",
    "This panel displays the original crystal active site structure. The non-reactive ligand remains bound inside our QM region, surrounded by the native amino acid environment captured during crystallography.\n",
    "\n",
    "<img src=\"../Images/after.png\" width=\"600\">\n",
    "\n",
    "This panel reveals the newly updated reactive complex. The old ligand has been replaced with the aligned product analogue. Notice how the new ligand is in similar location as our original ligand, suggesting that the ligand replacement has been successful!"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bfb3f181-6bca-4da4-b710-3a67b99140de",
   "metadata": {},
   "source": [
    "## Write QM Input File\n",
    "\n",
    "Now that we have created our region of interest with new, replaced ligand, we can continue with rest of our QM workflow, where we will:\n",
    "\n",
    "- Fix alpha carbons\n",
    "- Define the QM method\n",
    "- Assign it to the newly created region\n",
    "- Truncate the model\n",
    "- Write the desired QM input file\n",
    "\n",
    "\n",
    "*Note:* For more details regarding the protocols used in this step, please refer to [QM-only Calculation](https://qmzyme.readthedocs.io/en/latest/Examples/QM-only%20Calculation.html)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8f482158-52d8-4bdc-aa35-bbd0821340c6",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Selecting and freeznig CA atoms.\n",
    "ca_atoms = model.ksi_19nt_cutoff5.get_atoms(attribute=\"name\", value=\"CA\")\n",
    "model.ksi_19nt_cutoff5.set_fixed_atoms(atoms=ca_atoms)\n",
    "\n",
    "# We first create two QM method, each with desired level of theory.\n",
    "qm_method = QMzyme.QM_Method(\n",
    "    basis_set='6-31G*',\n",
    "    functional='wB97X-D3',\n",
    "    qm_input='OPT FREQ',\n",
    "    program='gaussian'\n",
    ")\n",
    "\n",
    "# Afterwards, we assign two methods to their own respective region.\n",
    "qm_method.assign_to_region(region=model.ksi_19nt_cutoff5)\n",
    "\n",
    "# We truncate the model with TerminalAlphaCarbon class.\n",
    "model.truncate()\n",
    "\n",
    "# Finally, we create input file for QM calculation.\n",
    "model.write_input()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "46f4c191-849f-43d2-b03b-62b2bb977895",
   "metadata": {},
   "source": [
    "## References Cited\n",
    "\n",
    "The PyMOL Molecular Graphics System, Version 3.0 Schrödinger, LLC."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}